Generalization Performance of Empirical Risk Minimization on Over-parameterized Deep ReLU Nets
Lin, Shao-Bo, Wang, Yao, Zhou, Ding-Xuan
In this paper, we study the generalization performance of global minima for implementing empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving almost optimal generalization error bounds for numerous types of data under mild conditions. Since over-parameterization is crucial to guarantee that the global minima of ERM on deep ReLU nets can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results indeed fill a gap between optimization and generalization.
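The over-parameterized ERM setting described in the abstract above can be illustrated with a toy experiment. This is a hedged sketch on synthetic data, not the paper's deepening scheme: when the hidden width far exceeds the sample size, plain full-batch gradient descent on the squared empirical risk drives the training error essentially to zero, i.e., it finds an interpolating global minimum.

```python
import numpy as np

# Minimal illustrative sketch (not the paper's construction): empirical risk
# minimization by full-batch gradient descent on a small over-parameterized
# two-layer ReLU net. The width (64) far exceeds the sample size (8), so a
# global minimum of the empirical risk can interpolate the training data.
rng = np.random.default_rng(0)
n, d, width = 8, 2, 64
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]            # synthetic regression target

W1 = rng.normal(size=(d, width)) / np.sqrt(d)  # hidden-layer weights
b1 = np.zeros(width)
w2 = rng.normal(size=width) / np.sqrt(width)   # output-layer weights

def forward(X):
    H = np.maximum(X @ W1 + b1, 0.0)           # ReLU activations
    return H, H @ w2

lr = 0.05
for _ in range(3000):
    H, pred = forward(X)
    err = pred - y                             # residual of the squared loss
    grad_w2 = H.T @ err / n
    grad_H = np.outer(err, w2) * (H > 0)       # backprop through the ReLU
    W1 -= lr * (X.T @ grad_H / n)
    b1 -= lr * grad_H.mean(axis=0)
    w2 -= lr * grad_w2

print("final empirical risk:", np.mean((forward(X)[1] - y) ** 2))
```

The paper's point is that among the many such interpolating minima, some provably generalize almost optimally; this sketch only shows that over-parameterization makes reaching a global minimum of ERM easy for gradient methods.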
Depth Selection for Deep ReLU Nets in Feature Extraction and Generalization
Han, Zhi, Yu, Siquan, Lin, Shao-Bo, Zhou, Ding-Xuan
Deep learning is recognized to be capable of discovering deep features for representation learning and pattern recognition without requiring elaborate feature-engineering techniques that rely on human ingenuity and prior knowledge. It has therefore triggered enormous research activity in machine learning and pattern recognition. One of the most important challenges of deep learning is to figure out the relation between a feature and the depth of deep neural networks (deep nets for short), so as to reflect the necessity of depth. Our purpose is to quantify this feature-depth correspondence in feature extraction and generalization. We present the adaptivity of features to depths, and vice versa, by exhibiting a depth-parameter trade-off in extracting both single features and composite features. Based on these results, we prove that implementing classical empirical risk minimization on deep nets achieves optimal generalization performance for numerous learning tasks. Our theoretical results are verified by a series of numerical experiments, including toy simulations and a real application to earthquake seismic intensity prediction.
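The depth-parameter trade-off and the necessity of depth discussed in the abstract above can be illustrated with a classical construction (a sketch, not the paper's argument): composing the two-ReLU "hat" function with itself k times yields a sawtooth with 2^(k-1) peaks using only O(k) neurons, while a depth-2 ReLU net provably needs width exponential in k to match it (Telgarsky's benefits-of-depth example).

```python
import numpy as np

def hat(x):
    # The "hat" function on [0, 1], exactly representable by two ReLU
    # neurons: h(x) = 2*relu(x) - 4*relu(x - 0.5).
    return 2 * np.maximum(x, 0.0) - 4 * np.maximum(x - 0.5, 0.0)

xs = np.linspace(0.0, 1.0, 2049)   # dense grid on [0, 1]
y = xs.copy()
k = 4
for _ in range(k):
    y = hat(y)                     # each extra layer doubles the oscillations

# Count strict interior local maxima of the k-fold composition.
peaks = int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))
print(peaks)                       # 2**(k-1) = 8
```

Each composition is one more hidden layer of constant width, so depth buys oscillation (and thus feature complexity) exponentially cheaply compared with width, which is the flavor of trade-off the paper quantifies.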
Realization of spatial sparseness by deep ReLU nets with massive data
Chui, Charles K., Lin, Shao-Bo, Zhang, Bo, Zhou, Ding-Xuan
The great success of deep learning poses urgent challenges for understanding its working mechanism and rationality. Depth, structure, and the massive size of the data are recognized to be three key ingredients of deep learning. In this paper, we aim at a rigorous verification of the importance of massive data in embodying the out-performance of deep learning. To approximate and learn spatially sparse and smooth functions, we establish a novel sampling theorem in learning theory that shows the necessity of massive data. We then prove that implementing classical empirical risk minimization on certain deep nets realizes the optimal learning rates derived in the sampling theorem. This perhaps explains why deep learning performs so well in the era of big data.

With the rapid development of data mining and knowledge discovery, data of massive size are collected in various disciplines [50], including medical diagnosis, financial market analysis, computer vision, natural language processing, time series forecasting, and search engines. Such massive data bring additional opportunities to discover subtle data features that cannot be reflected by small data sets, while posing a crucial challenge for machine learning: to develop learning schemes that realize the benefits of massive data. Although numerous learning schemes such as distributed learning [26], localized learning [32], and sub-sampling [14] have been proposed to handle massive data, all of these schemes focus on tractability rather than on the benefit of massiveness. It therefore remains open to explore the benefits brought by massive data and to develop feasible learning strategies for realizing them.
Deep learning [18], characterized by training deep neural networks (deep nets for short) to extract data features using rich computational resources such as modern graphics processing units (GPUs) and custom processors, has achieved remarkable success in computer vision [23], speech recognition [24], and game theory [40], practically demonstrating its power in tackling massive data.

C.K. Chui is also associated with the Department of Statistics, Stanford University, CA 94305, USA. Shao-Bo Lin is with the Center of Intelligent Decision-making and Machine Learning, School of Management, Xi'an Jiaotong University, Xi'an, China.